# Program verification under weak memory consistency

Viktor Vafeiadis

**MPI-SWS** 

Marktoberdorf, August 2019

# Axiomatic memory models

## **Executions**

#### **Events**

► Reads, Writes, Updates, Fences

## Relations

- ▶ Program order, po (also called "sequenced-before", sb)
- ► Reads-from, rf



## **Executions**

## Definition (Label)

A *label* has one of of the following forms:

$$R \times V_r$$
  $W \times V_w$   $U(x V_r V_w)$   $F$ 

where  $x \in \text{Loc}$  and  $v_r, v_w \in \text{Val}$ .

## Definition (Event)

An *event* is a triple  $\langle id, i, l \rangle$  where

- $ightharpoonup id \in \mathbb{N}$  is an event identifier,
- ▶  $i \in \text{Tid} \cup \{0\}$  is a thread identifier, and
- ▶ / is a label.

## Executions

## Definition (Execution graph)

An execution graph is a tuple  $\langle E, po, rf \rangle$  where:

- E is a finite set of events
- ▶ po ("program order") is a partial order on E
- ▶ rf ("reads-from") is a binary relation on E such that:
  - ▶ For every  $\langle w, r \rangle \in rf$ 
    - ▶  $typ(w) \in \{W,U\}$
    - ▶  $typ(r) \in \{R,U\}$

    - $ightharpoonup \operatorname{val}_{\mathbf{w}}(w) = \operatorname{val}_{\mathbf{r}}(r)$
  - ►  $rf^{-1}$  is a function (that is: if  $\langle w_1, r \rangle, \langle w_2, r \rangle \in rf$  then  $w_1 = w_2$ )

## Some notations

Let  $G = \langle E, po, rf \rangle$  be an execution graph.

- $ightharpoonup G.E \triangleq E$
- **▶** *G*.po \( \delta \) *po*
- $ightharpoonup G.rf \stackrel{\triangle}{=} rf$
- ▶  $G.R \triangleq \{r \in E \mid typ(r) = R \lor typ(r) = U\}$
- ▶  $G.W \triangleq \{w \in E \mid typ(w) = W \lor typ(w) = U\}$
- ►  $G.RMW \triangleq \{u \in E \mid typ(u) = U\}$
- ▶  $G.F \triangleq \{f \in E \mid typ(f) = F\}$
- $G.R_x \triangleq G.R \cap \{r \in E \mid loc(r) = x\}$
- **.**..

## Mapping programs to executions: Example





# Consistency predicate

Let X be some consistency predicate (on execution graphs)

## Definition (Allowed outcome under a declarative model)

An outcome O is *allowed* for a program P under X if there exists an execution graph G such that:

- ▶ *G* is an execution graph of *P* with outcome *O*.
- G is X-consistent.

#### Exception: "catch-fire" semantics

 $\dots$  or if there exists an execution graph G such that:

- ightharpoonup G is an execution graph of P.
- G is X-consistent.
- ► *G* is "bad".

## Completeness

The most basic consistency condition:

## Definition (Completeness)

An execution graph G is called *complete* if

$$codom(G.rf) = G.R$$

i.e., every read reads from some write.

# Sequential consistency

the result of any execution is the same as if the operations of all the processors were executed in some sequential order, respecting the order specified by the program [Lamport, 1979]

# Sequential consistency [Lamport]

#### Definition

Let sc be a total order on G.E. G is called SC-consistent wrt sc if the following hold:

- ▶ If  $\langle a, b \rangle \in G$ .po then  $\langle a, b \rangle \in sc$ .
- ▶ If  $\langle a,b\rangle \in G.$ rf then  $\langle a,b\rangle \in sc$  and there does not exist  $c \in G.$ W<sub>loc(b)</sub> such that  $\langle a,c\rangle \in sc$  and  $\langle c,b\rangle \in sc$ .

## Definition

An execution graph *G* is called **SC**-*consistent* if the following hold:

- ▶ *G* is complete.
- $\triangleright$  G is SC-consistent wrt some total order sc on G.E.

## SB example

## Store buffering (SB)

$$x = y = 0$$
  
 $x := 1$  |  $y := 1$   
 $a := y$  |  $b := x$ 





# Sequential consistency (Alternative)

## Definition (Modification order (aka coherence order))

mo is called a *modification order* for an execution graph G if  $mo = \bigcup_{x \in I \text{ or } mo_x} mo_x$  where each  $mo_x$  is a total order on  $G.W_x$ .

## Definition (Alternative SC definition)

An execution graph *G* is called **SC**-*consistent* if the following hold:

- ▶ *G* is complete
- ► There exists a modification order mo for G such that  $G.po \cup G.rf \cup mo \cup rb$  is acyclic where:
  - ▶  $\mathbf{rb} \triangleq G.\mathbf{rf}^{-1}; \mathbf{mo} \setminus \mathsf{id}$  (from-reads / reads-before)

# SB example

## Store buffering (SB)

$$x = y = 0$$
  
 $x := 1$  |  $y := 1$   
 $a := y$  |  $b := x$ 





# Equivalence

#### Theorem

The two SC definitions are equivalent.

## Proof (sketch).

## **Lamport SC** $\Rightarrow$ alternative SC:

- ► Take  $mo_x \triangleq [W_x]$ ; sc;  $[W_x]$ .
- ▶ Then,  $G.po \cup G.rf \cup mo \cup rb \subseteq sc.$

## Alternative $SC \Rightarrow Lamport SC$ :

► Take sc to be any total order extending  $G.po \cup G.rf \cup mo \cup rb$ .

# Relaxing sequential consistency

- ► SC is very expensive to implement in hardware.
- It also forbids various optimizations that are sound for sequential code.

What most hardware guarantee and compilers preserve is "SC-per-location" (aka *coherence*).

#### Definition

An execution graph *G* is called *coherent* if the following hold:

- ▶ *G* is complete
- ► For every location x, there exists a total order sc<sub>x</sub> on all accesses to x such that:
  - ▶ If  $\langle a, b \rangle \in [RW_X]$ ; G.po;  $[RW_X]$  then  $\langle a, b \rangle \in sc_X$
  - ▶ If  $\langle a,b\rangle \in [\mathbb{W}_X]$ ;  $G.\mathrm{rf}$ ;  $[\mathbb{R}_X]$  then  $\langle a,b\rangle \in \mathrm{sc}_X$  and there does not exist  $c \in G.\mathbb{W}_X$  such that  $\langle a,c\rangle \in \mathrm{sc}_X$  and  $\langle c,b\rangle \in \mathrm{sc}_X$ .

## Alternative definition of coherence I

SC: po  $\cup$  rf  $\cup$  mo  $\cup$  rb is acyclic

COH:  $po|_{loc} \cup rf \cup mo \cup rb$  is acyclic

#### Definition

Let mo be a modification order for an execution graph G. G is called *coherent wrt* mo if  $G.po|_{loc} \cup G.rf \cup mo \cup rb$  is acyclic (where  $rb \triangleq G.rf^{-1}$ ; mo \ id).

#### $\mathsf{Theorem}$

An execution graph G is coherent iff the following hold:

- ▶ *G* is complete
- ▶ G is coherent wrt some modification order mo for G.

# "Bad patterns" I

$$\begin{array}{ccc}
R_{x} \\
\text{rf} & \downarrow po \\
W_{x} & x := 1
\end{array}$$

no-future-read

#### Recall:

- ▶ W is either a write or an RMW.
- ▶ R is either a read or an RMW.

# "Bad patterns" II

$$x := 1 \quad || \quad a := x \text{ } /\!\! / 2$$

$$x := 2 \quad || \quad a := x \text{ } /\!\! / 2$$

$$a := x \text{ } /\!\! / 1$$

$$W_{x} \xrightarrow{\text{mo}} V_{x} \qquad W_{x} \xrightarrow{\text{rf}} R_{x}$$

$$\text{coherence-ww} \qquad \text{coherence-rw}$$

$$W_{x} \xrightarrow{\text{mo}} V_{x} \qquad W_{x} \xrightarrow{\text{rf}} p_{0}$$

$$R_{x} \qquad W_{x} \xrightarrow{\text{rf}} p_{0}$$

$$Coherence-wr \qquad \text{coherence-rr}$$

# "Bad patterns" III



In coherent executions, an RMW event may only read from its immediate mo-predecessor.

## Alternative definition of coherence II

#### Theorem

Let mo be a modification order for an execution graph G. G is coherent wrt mo iff the following hold:

- ▶ rf; po is irreflexive. (no-future-read)
- ▶ mo; po is irreflexive. (coherence-ww)
- ▶ mo; rf; po is irreflexive. (coherence-rw)
- $ightharpoonup rf^{-1}$ ; mo; po is irreflexive. (coherence-wr)
- $ightharpoonup rf^{-1}$ ; mo; rf; po is irreflexive. (coherence-rr)
- ▶ rf is irreflexive. (rmw-1)
- ▶ mo; rf is irreflexive. (rmw-2)
- $ightharpoonup rf^{-1}$ ; mo; mo is irreflexive. (rmw-atomicity)

# Examples (aka "litmus tests")

#### Coherence test

$$x = 0$$
 $x := 1$ 
 $a := x // 2$ 
 $x := 2$ 
 $b := x // 1$ 

## Store buffering

$$x = y = 0$$
  
 $x := 1$   $y := 1$   
 $a := y //0$   $b := x //0$ 

# **Atomicity**

#### Parallel increment

$$egin{aligned} x &= 0 \ a := extsf{FAA}(x,1) & b := extsf{FAA}(x,1) \end{aligned}$$

Guarantees that  $a = 1 \lor b = 1$ .

Can we implement locks in this semantics?

# Spinlock implementation

## Implementing locks?

Under COH, the spinlock implementation does not guarantee mutual exclusion.

## Message passing

More generally, COH is often too weak:

$$x = y = 0$$
  
 $x := 42$ ;  $\begin{vmatrix} a := y; \\ \text{while } \neg a \text{ do } a := y; \\ b := x \# 0 \end{vmatrix}$ 

$$x = y = 0$$
  
 $x := 42; \parallel a := y; //1$   
 $y := 1 \parallel b := x //0$ 

MP is a common programming idiom.

How can we disallow the weak behavior?

# Supporting message passing



#### Solution:

- Strengthen the notion of an "observed" write.
- ▶ In other words, make rf-edges "synchronizing."

# Release/acquire (RA) memory model

SC: po  $\cup$  rf  $\cup$  mo  $\cup$  rb is acyclic

COH:  $po|_{loc} \cup rf \cup mo \cup rb$  is acyclic RA:  $(po \cup rf)^+|_{loc} \cup mo \cup rb$  is acyclic

#### Definition

Let mo be a modification order for an execution graph G. G is called RA-consistent wrt mo if  $(po \cup rf)^+|_{loc} \cup mo \cup rb$  is acyclic for some modification order mo for G (where  $rb \triangleq G.rf^{-1}; mo \setminus id$ ).

#### Definition

An execution graph G is RA-consistent if the following hold:

- ► *G* is complete
- $\triangleright$  *G* is RA-consistent wrt some modification order mo for *G*.

# Alternative definition of RA consistency

#### Theorem

Let mo be a modification order for an execution graph G. G is RA-consistent wrt mo iff the following hold:

- ightharpoonup (po  $\cup$  rf)<sup>+</sup> is irreflexive. (no-future-read)
- ightharpoonup mo;  $(po \cup rf)^+$  is irreflexive. (coherence-ww)
- ▶  $rf^{-1}$ ; mo;  $(po \cup rf)^+$  is irreflexive. (coherence-wr)
- $ightharpoonup rf^{-1}$ ; mo; mo is irreflexive. (rmw-atomicity)

# Mixing the models

Revisit the MP idiom:

$$x := 42$$
  
 $y := 1$   $\begin{vmatrix} a := y \\ \text{while } \neg a \text{ do } a := y \\ b := x \# 0 \end{vmatrix}$ 

- We only need the last read of y to synchronize.
- ▶ Idea: introduce access modes.

$$x :=_{\mathsf{rlx}} 42$$
  $x :=_{\mathsf{rel}} 1$   $a := y_{\mathsf{rlx}}$   $a := y_{\mathsf{acq}}$   $a := y_{\mathsf{acq}}$   $b := x_{\mathsf{rlx}} \ \# 0$ 

## Happens-before

Each memory accesses has a mode:

- ► Reads: rlx, acq, or sc
- ► Writes: rlx, rel, or sc
- ► RMWs: rlx, acq, rel, acq-rel, or sc

"Strength" order 

is given by:

$$rlx \xrightarrow{acq} acq-rel \rightarrow sc$$

Synchronization:

$$G.sw = [W^{\supseteq rel}]; G.rf; [R^{\supseteq acq}]$$

Happens-before:

$$G.\mathtt{hb} = (G.\mathtt{po} \cup G.\mathtt{sw})^+$$

# Towards C/C++11 memory model

SC: po  $\cup$  rf  $\cup$  mo  $\cup$  rb is acyclic

COH:  $po|_{loc} \cup rf \cup mo \cup rb$  is acyclic RA:  $(po \cup rf)^+|_{loc} \cup mo \cup rb$  is acyclic C11:  $hb|_{loc} \cup rf \cup mo \cup rb$  is acyclic

## Definition

Let mo be a modification order for an execution graph G. G is called C11-consistent wrt mo if  $hb|_{loc} \cup rf \cup mo \cup rb$  is acyclic (where  $rb \triangleq G.rf^{-1}; mo \setminus id$ ).

#### Definition

An execution graph G is C11-consistent if the following hold:

- ► *G* is complete
- $\triangleright$  *G* is C11-consistent wrt some modification order mo for *G*.

# The C/C++11 memory model

The full C/C++11 is more general:

- Non-atomics for non-racy code (the default!)
- ► Four types of fences for fine grained control
- ▶ SC accesses to ensure sequential consistency if needed
- ► More elaborate definition of sw ("release sequences")

```
int a = 0;
int x = 0;
a = 42; || if(x == 1){
x = 1; || print(a);
```

```
int a = 0;
int x = 0;
a = 42; if(x == 1){
x = 1; race print(a);
}
```

```
int a = 0;
int x = 0;
```

```
int a = 0;
atomic_int x = 0;
```

```
int a = 0;
int x = 0;
```

```
int a = 0;
                                               atomic_int x = 0;
a = 42; if (x == 1) { a = 42; if (x_{rlx} == 1) { x = 1; race print(a); } x_{rlx} = 1; race print(a);
                                          x_{rlx} = 1; race print(a);
```

```
int a = 0;
int x = 0;
```

```
int a = 0:
                                                   atomic_int x = 0;
a = 42; if (x == 1) { a = 42; if (x_{rlx} == 1) { x = 1; race print(a); } x_{rlx} = 1; race print(a);
```

```
int a = 0;
 atomic_int x = 0;
```

```
int a = 0;
int x = 0;
```

```
int a = 0:
                                                   atomic_int x = 0;
a = 42; if (x == 1) { a = 42; if (x_{rlx} == 1) { x = 1; race print(a); } x_{rlx} = 1; race print(a);
```

```
int a = 0:
     atomic_int x = 0;
a = 42; | if(x<sub>acq</sub> == 1){
x<sub>rel</sub> = 1; | print(a);
```

```
int a = 0;
int x = 0;
```

```
int a = 0:
                                                   atomic_int x = 0;
a = 42; if (x == 1) { a = 42; if (x_{rlx} == 1) { x = 1; race print(a); } x_{rlx} = 1; race print(a);
```

```
int a = 0;
     atomic_int x = 0;
a = 42; if (x_{acq} == 1) { x_{rel} = 1; x_{sw} print(a);
```

```
int a = 0;
```

```
int a = 0;
int x = 0; atomic_int x = 0; a = 42; | if(x == 1) \{ a = 42; | if(x_{rlx} == 1) \{ x = 1; | x_{rlx} = 1; | race | print(a); \}
```

```
int a = 0:
atomic_int x = 0;
```

```
int a = 0:
         atomic_int x = 0;
```

```
int a = 0;
```

```
int a = 0;
int x = 0; atomic_int x = 0; a = 42; | if(x == 1) \{ a = 42; | if(x_{rlx} == 1) \{ x = 1; | x_{rlx} = 1; | race | print(a); \}
```

```
int a = 0:
atomic_int x = 0;
```

```
int a = 0:
         atomic_int x = 0;
```

```
int a = 0;
```

```
int a = 0;
int x = 0; atomic_int x = 0; a = 42; | if(x == 1) \{ a = 42; | if(x_{rlx} == 1) \{ x = 1; | x_{rlx} = 1; | race | print(a); \}
```

```
int a = 0:
atomic_int x = 0;
```

```
int a = 0:
         atomic_int x = 0;
```

# The "synchronizes-with" relation



$$\mathtt{sw} \triangleq \big([\mathtt{W}^{\supseteq \mathsf{rel}}] \cup [\mathtt{F}^{\supseteq \mathsf{rel}}]; \mathtt{po}\big); \mathtt{rf}; \big([\mathtt{R}^{\supseteq \mathsf{acq}}] \cup \mathtt{po}; [\mathtt{F}^{\supseteq \mathsf{acq}}]\big)$$



# Release sequences (RMW's)

$$x_{\text{rlx}} := 42;$$
  $y_{\text{rel}} := 1$   $a := \text{FAI}_{\text{rlx}}(y);$  // 1  $b := y_{\text{acq}};$  // 2  $c := x_{\text{rlx}};$  // 0

$$\mathtt{sw} \triangleq \big([\mathtt{W}^{\supseteq \mathsf{rel}}] \cup [\mathtt{F}^{\supseteq \mathsf{rel}}]; \mathtt{po}\big); \mathtt{rf}^+; \big([\mathtt{R}^{\supseteq \mathsf{acq}}] \cup \mathtt{po}; [\mathtt{F}^{\supseteq \mathsf{acq}}]\big)$$

#### "Catch-fire" semantics

### Definition (Race in C11)

Given a C11-execution graph G, we say that two events a, b C11-race in G if the following hold:

- $\triangleright$  a  $\neq$  b
- $\blacktriangleright \{ \mathsf{typ}(a), \mathsf{typ}(b) \} \cap \{ \mathsf{W}, \mathsf{RMW} \} \neq \emptyset$
- ▶  $na \in \{mod(a), mod(b)\}$
- $ightharpoonup \langle a,b\rangle \not\in hb$  and  $\langle b,a\rangle \not\in hb$

G is called C11-racy if some a, b C11-race in G.

# C11 consistency

#### Definition

Let  $\underline{mo}$  be a modification order for an execution graph G. G is called C11-consistent wrt  $\underline{mo}$  if:

- ▶  $hb|_{loc} \cup rf \cup mo \cup rb$  is acyclic (where  $rb \triangleq G.rf^{-1}; mo \setminus id$ ).
- ► ...sc... ?

#### Definition

An execution graph G is C11-consistent if the following hold:

- G is complete
- $\triangleright$  *G* is C11-consistent wrt some modification order mo for *G*.

### SC conditions

- ► The most involved part of the model, due to the possible mixing of different access modes to the same location.
- ► Changed in C++20
- ▶ If there is no mixing of SC and non-SC accesses, then additionally require acyclicity of  $hb \cup mo_{sc} \cup rb_{sc}$ .

#### **Further reading:**

- Overhauling SC atomics in C11 and OpenCL. Mark Batty, Alastair F. Donaldson, John Wickerson, POPL 2016.
- Repairing sequential consistency in C/C++11. Ori Lahav, Viktor Vafeiadis, Jeehoon Kang, Chung-Kil Hur, Derek Dreyer, PLDI 2017.

# Repaired SC condition for fences

```
\begin{array}{ll} \textbf{eco} \triangleq (\texttt{rf} \cup \texttt{mo} \cup \textbf{rb})^+ & (\texttt{extended coherence order}) \\ \texttt{psc}_F \triangleq [F^{\text{sc}}]; (\texttt{hb} \cup \texttt{hb}; \texttt{eco}; \texttt{hb}); [F^{\text{sc}}] & (\texttt{SC fence order}) \end{array}
```

#### Condition on SC fences

psc<sub>F</sub> is acyclic

## Example: SB with fences

$$x = y = 0$$
 $x_{rlx} := 1;$   $y_{rlx} := 1;$ 
fence(sc); fence(sc);
 $a := y_{rlx}; // 0$   $b := x_{rlx}; // 0$ 

## Exercise: ARC

Add access modes to make the following reference counting implementation correct under C11.

```
new(v){
                      clone(a){
                        FADD(a.count, +1);
  a = alloc():
  a.data = v:
  a.count = 1;
                      drop(a){
  return a;
                        t = FADD(a.count, -1);
                        if(t == 1){
read(a){
                          free(a);
  return a.data;
```

FADD = fetch\_and\_add